Light Field Denoising via Anisotropic Parallax Analysis in a CNN Framework
Light field (LF) cameras provide perspective information of scenes by taking
directional measurements of the focusing light rays. The raw outputs are
usually dark with additive camera noise, which impedes subsequent processing
and applications. We propose a novel LF denoising framework based on
anisotropic parallax analysis (APA). Two convolutional neural networks are
jointly designed for the task: first, the structural parallax synthesis network
predicts the parallax details for the entire LF based on a set of anisotropic
parallax features. These novel features can efficiently capture the high
frequency perspective components of a LF from noisy observations. Second, the
view-dependent detail compensation network restores non-Lambertian variation to
each LF view by incorporating view-specific spatial energies. Extensive experiments
show that the proposed APA LF denoiser achieves much better denoising
performance than state-of-the-art methods in terms of both visual quality and
preservation of parallax details.
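No implementation accompanies this abstract in the listing; the following is a minimal, hypothetical PyTorch-style skeleton of the two-stage design described above. The module names, channel widths, and the way the anisotropic parallax features enter the first network are our own assumptions, not the authors' code.

```python
# Hypothetical two-stage skeleton: a parallax-synthesis network over the
# stacked LF views, followed by per-view detail compensation.
import torch
import torch.nn as nn

class ParallaxSynthesisNet(nn.Module):
    """Predicts parallax structure for the whole LF from noisy stacked views."""
    def __init__(self, n_views, feat=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_views, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, n_views, 3, padding=1))

    def forward(self, noisy_views):             # (B, V, H, W): views as channels
        return self.body(noisy_views)           # predicted clean parallax structure

class DetailCompensationNet(nn.Module):
    """Restores view-dependent (non-Lambertian) detail for a single view."""
    def __init__(self, feat=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1))

    def forward(self, synth_view, noisy_view):  # each (B, 1, H, W)
        residual = self.body(torch.cat([synth_view, noisy_view], dim=1))
        return synth_view + residual            # compensated output view
```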
Probabilistic-based Feature Embedding of 4-D Light Fields for Compressive Imaging and Denoising
The high-dimensional nature of the 4-D light field (LF) poses great
challenges for efficient and effective feature embedding, which severely
impacts the performance of downstream tasks. To tackle this crucial
issue, in contrast to existing methods with empirically-designed architectures,
we propose a probabilistic-based feature embedding (PFE), which learns a
feature embedding architecture by assembling various low-dimensional
convolution patterns in a probability space for fully capturing spatial-angular
information. Building upon the proposed PFE, we then leverage the intrinsic
linear imaging model of the coded aperture camera to construct a
cycle-consistent 4-D LF reconstruction network from coded measurements.
Moreover, we incorporate PFE into an iterative optimization framework for 4-D
LF denoising. Our extensive experiments demonstrate the significant superiority
of our methods on both real-world and synthetic 4-D LF images, both
quantitatively and qualitatively, when compared with state-of-the-art methods.
The source code will be publicly available at
https://github.com/lyuxianqiang/LFCA-CR-NET
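As a rough illustration of assembling low-dimensional convolution patterns in a probability space, a hypothetical sketch could weight a spatial branch and an angular branch by learned selection probabilities. The branch choices, tensor layout, and plain softmax used below are our assumptions rather than the paper's actual design.

```python
# Hypothetical assembly of low-dimensional convolution patterns for a 4-D LF,
# weighted by learned selection probabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticEmbeddingSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.spatial = nn.Conv2d(c_in, c_out, 3, padding=1)   # pattern over (H, W)
        self.angular = nn.Conv2d(c_in, c_out, 3, padding=1)   # pattern over (U, V)
        self.logits = nn.Parameter(torch.zeros(2))            # pattern-selection logits

    def forward(self, lf):                                    # lf: (B, C, U, V, H, W)
        b, c, u, v, h, w = lf.shape
        sp = self.spatial(lf.permute(0, 2, 3, 1, 4, 5).reshape(b * u * v, c, h, w))
        sp = sp.reshape(b, u, v, -1, h, w).permute(0, 3, 1, 2, 4, 5)
        an = self.angular(lf.permute(0, 4, 5, 1, 2, 3).reshape(b * h * w, c, u, v))
        an = an.reshape(b, h, w, -1, u, v).permute(0, 3, 4, 5, 1, 2)
        p = F.softmax(self.logits, dim=0)                     # probabilities over patterns
        return p[0] * sp + p[1] * an                          # assembled spatial-angular features
```

A full model would stack several such blocks so that the learned probabilities effectively select an embedding architecture rather than a single fixed pattern.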
Unleash the Potential of 3D Point Cloud Modeling with A Calibrated Local Geometry-driven Distance Metric
Quantifying the dissimilarity between two unstructured 3D point clouds is a
challenging task, with existing metrics often relying on measuring the distance
between corresponding points, which can be either inefficient or ineffective. In
this paper, we propose a novel distance metric called Calibrated Local Geometry
Distance (CLGD), which computes the difference between the underlying 3D
surfaces calibrated and induced by a set of reference points. Each reference
point is associated with the two given point clouds by computing its
directional distances to them; the difference between these directional
distances for the same reference point characterizes the geometric difference
between the two point clouds in the local region around that point. Finally, CLGD is obtained by
averaging the directional distance differences of all reference points. We
evaluate CLGD on various optimization and unsupervised learning-based tasks,
including shape reconstruction, rigid registration, scene flow estimation, and
feature representation. Extensive experiments show that CLGD achieves
significantly higher accuracy on all tasks, in a memory- and computation-efficient
manner, compared with existing metrics. As a generic metric, CLGD has
the potential to advance 3D point cloud modeling. The source code is publicly
available at https://github.com/rsy6318/CLGD
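A heavily simplified, hypothetical reading of the metric is sketched below: each reference point is queried against both clouds and the per-point distance differences are averaged. The paper's actual calibrated directional distance is more involved; the authors' implementation is at the repository above.

```python
# Simplified sketch of the CLGD idea (not the authors' exact definition):
# compare each reference point's distance to the two clouds and average.
import numpy as np
from scipy.spatial import cKDTree

def clgd_sketch(cloud_a, cloud_b, refs):
    """cloud_a: (N, 3), cloud_b: (M, 3), refs: (R, 3) reference points."""
    d_a, _ = cKDTree(cloud_a).query(refs)      # distance from each reference to cloud A
    d_b, _ = cKDTree(cloud_b).query(refs)      # distance from each reference to cloud B
    return float(np.mean(np.abs(d_a - d_b)))   # averaged per-reference difference
```

Reference points would typically be sampled around the two shapes, for example by perturbing points drawn from either cloud.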
Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation
The past few years have witnessed the great success and prevalence of
self-supervised representation learning within the language and 2D vision
communities. However, such advancements have not been fully migrated to the
field of 3D point cloud learning. Different from existing pre-training
paradigms designed for deep point cloud feature extractors that fall into the
scope of generative modeling or contrastive learning, this paper proposes a
translative pre-training framework, namely PointVST, driven by a novel
self-supervised pretext task of cross-modal translation from 3D point clouds to
their corresponding diverse forms of 2D rendered images. More specifically, we
begin by deriving view-conditioned point-wise embeddings through the
insertion of a viewpoint indicator, and then adaptively aggregate a
view-specific global codeword, which can be further fed into subsequent 2D
convolutional translation heads for image generation. Extensive experimental
evaluations on various downstream task scenarios demonstrate that our PointVST
shows consistent and prominent performance superiority over current
state-of-the-art approaches as well as satisfactory domain transfer capability.
Our code will be publicly available at https://github.com/keeganhk/PointVST
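For illustration only, a hypothetical skeleton of the translative pretext task might condition per-point features on a viewpoint vector, pool a view-specific codeword, and decode it with a small 2D convolutional head. Every layer size, the form of the viewpoint indicator, and the pooling choice here are assumptions, not the released PointVST architecture.

```python
# Hypothetical sketch of a point-to-image translative pretext head.
import torch
import torch.nn as nn

class PointToImageSketch(nn.Module):
    def __init__(self, point_feat=256, img_size=32):
        super().__init__()
        self.view_mlp = nn.Sequential(                  # fuse the viewpoint indicator
            nn.Linear(point_feat + 3, point_feat), nn.ReLU())
        self.decoder = nn.Sequential(                   # 2D convolutional translation head
            nn.ConvTranspose2d(point_feat, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, 2, 1))         # 4x4 seed -> 32x32 image
        self.img_size = img_size

    def forward(self, point_feats, viewpoint):
        # point_feats: (B, N, F) backbone features; viewpoint: (B, 3) view direction
        b, n, f = point_feats.shape                     # F must equal point_feat
        vp = viewpoint.unsqueeze(1).expand(b, n, 3)
        cond = self.view_mlp(torch.cat([point_feats, vp], dim=-1))
        codeword = cond.max(dim=1).values               # view-specific global codeword
        seed = codeword.view(b, f, 1, 1).expand(b, f, 4, 4)
        return self.decoder(seed)                       # predicted rendered image
```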
Decoupling Dynamic Monocular Videos for Dynamic View Synthesis
The challenge of dynamic view synthesis from dynamic monocular videos, i.e.,
synthesizing novel views for free viewpoints given a monocular video of a
dynamic scene captured by a moving camera, mainly lies in accurately modeling
the dynamic objects of a scene using limited 2D frames, each with a varying
timestamp and viewpoint. Existing methods usually require pre-processed 2D
optical flow and depth maps by off-the-shelf methods to supervise the network,
making them suffer from the inaccuracy of the pre-processed supervision and the
ambiguity when lifting the 2D information to 3D. In this paper, we tackle this
challenge in an unsupervised fashion. Specifically, we decouple the motion of
the dynamic objects into object motion and camera motion, which are
regularized by the proposed unsupervised surface consistency and patch-based
multi-view constraints, respectively. The former enforces the 3D geometric surfaces of moving
objects to be consistent over time, while the latter regularizes their
appearances to be consistent across different viewpoints. Such a fine-grained
motion formulation can alleviate the learning difficulty for the network, thus
enabling it to produce not only higher-quality novel views but also more
accurate scene flow and depth than existing methods that require extra
supervision.
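Very roughly, the two constraints could be written as the hypothetical loss terms below; the actual formulation in the paper is defined on a dynamic radiance-field representation and differs in detail, so these functions are only an assumed illustration of the idea.

```python
# Hypothetical illustration of the two unsupervised constraints.
import torch

def surface_consistency_loss(points_t, object_flow, points_t1):
    """Advect frame-t surface points of a moving object by the predicted object
    motion and penalize their distance to the frame-(t+1) surface points."""
    warped = points_t + object_flow                  # (N, 3) advected surface points
    dists = torch.cdist(warped, points_t1)           # (N, M) pairwise distances
    return dists.min(dim=1).values.mean()            # distance to the nearest surface point

def patch_multiview_loss(patch_ref, patch_warped):
    """Penalize appearance differences of the same patch rendered from two views."""
    return (patch_ref - patch_warped).abs().mean()   # L1 photometric consistency
```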
Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions
Depth estimation is a fundamental problem for light field photography
applications. Numerous methods have been proposed in recent years, which either
focus on crafting cost terms for more robust matching, or on analyzing the
geometry of scene structures embedded in the epipolar-plane images. Significant
improvements have been made in terms of overall depth estimation error;
however, current state-of-the-art methods still show limitations in handling
intricate occluding structures and complex scenes with multiple occlusions. To
address these challenging issues, we propose a very effective depth estimation
framework which focuses on regularizing the initial label confidence map and
edge strength weights. Specifically, we first detect partially occluded
boundary regions (POBR) via superpixel-based regularization. A series of
shrinkage/reinforcement operations is then applied to the label confidence map
and edge strength weights over the POBR. We show that after weight
manipulations, even a low-complexity weighted least squares model can produce
much better depth estimation than state-of-the-art methods in terms of average
disparity error rate, occlusion boundary precision-recall rate, and the
preservation of intricate visual features.
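The final weighted-least-squares step can be illustrated with a small, hypothetical sketch on a 4-neighbour pixel grid, where the confidence map weights the data term and the (already shrunk/reinforced) edge strengths weight the smoothness term; the paper's actual cost terms and weight-manipulation rules are not reproduced here.

```python
# Hypothetical WLS refinement: solve (C + lam * L) d = C d_init, where C holds
# per-pixel label confidences and L is a graph Laplacian built from edge weights.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def wls_refine(d_init, conf, edge_w, lam=10.0):
    """d_init, conf, edge_w: (H, W) arrays; returns the refined disparity map."""
    h, w = d_init.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    for di, dj in [(0, 1), (1, 0)]:                   # right and down neighbours
        a = idx[:h - di, :w - dj].ravel()
        b = idx[di:, dj:].ravel()
        wgt = np.minimum(edge_w[:h - di, :w - dj], edge_w[di:, dj:]).ravel()
        rows += [a, b]; cols += [b, a]; vals += [wgt, wgt]
    W = sp.coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))), shape=(n, n))
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W   # weighted graph Laplacian
    C = sp.diags(conf.ravel())                            # data-term confidence
    d = spsolve((C + lam * L).tocsr(), conf.ravel() * d_init.ravel())
    return d.reshape(h, w)
```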
Low-latency compression of mocap data using learned spatial decorrelation transform
Due to the growing use of human motion capture (mocap) in movies, video
games, sports, and other fields, it is highly desirable to compress mocap data for efficient
storage and transmission. This paper presents two efficient frameworks for
compressing human mocap data with low latency. The first framework processes
the data in a frame-by-frame manner so that it is ideal for mocap data
streaming and time critical applications. The second one is clip-based and
provides a flexible tradeoff between latency and compression performance. Since
mocap data exhibits some unique spatial characteristics, we propose a very
effective transform, namely learned orthogonal transform (LOT), for reducing
the spatial redundancy. The LOT problem is formulated as minimizing the squared
reconstruction error under orthogonality and sparsity regularization, and is solved
via alternating iteration. We also adopt predictive coding and a temporal DCT for temporal
decorrelation in the frame- and clip-based frameworks, respectively.
Experimental results show that the proposed frameworks achieve higher
compression performance at lower computational cost and latency than
state-of-the-art methods.
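A compact, hypothetical sketch of such an alternating scheme is given below: with the transform T constrained to be orthogonal, the coefficient update is a soft-thresholding step and the transform update is an orthogonal Procrustes solution. The variable names, thresholding rule, and initialization are our assumptions, not the paper's exact algorithm.

```python
# Hypothetical alternating scheme for a learned orthogonal transform (LOT):
# minimize ||X - T C||_F^2 + lam * ||C||_1  subject to  T^T T = I.
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def learn_orthogonal_transform(X, n_iters=50, lam=0.1):
    """X: (d, n) matrix of spatially arranged mocap data."""
    d = X.shape[0]
    T = np.eye(d)                                    # start from the identity basis
    C = T.T @ X
    for _ in range(n_iters):
        C = soft_threshold(T.T @ X, lam / 2.0)       # sparse coefficient update
        U, _, Vt = np.linalg.svd(X @ C.T, full_matrices=False)
        T = U @ Vt                                   # orthogonal Procrustes update
    return T, C
```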
Human Motion Capture Data Tailored Transform Coding
Human motion capture (mocap) is a widely used technique for digitalizing
human movements. With growing usage, compressing mocap data has received
increasing attention, since compact data size enables efficient storage and
transmission. Our analysis shows that mocap data have some unique
characteristics that distinguish them from images and videos. Therefore,
directly borrowing image or video compression techniques, such as discrete
cosine transform, does not work well. In this paper, we propose a novel
mocap-tailored transform coding algorithm that takes advantage of these
features. Our algorithm segments the input mocap sequences into clips, which
are represented in 2D matrices. Then it computes a set of data-dependent
orthogonal bases to transform the matrices to frequency domain, in which the
transform coefficients have significantly less dependency. Finally, the
compression is obtained by entropy coding of the quantized coefficients and the
bases. Our method has low computational cost and can be easily extended to
compress mocap databases. It also requires neither training nor complicated
parameter setting. Experimental results demonstrate that the proposed scheme
significantly outperforms state-of-the-art algorithms in terms of compression
performance and speed.
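A minimal, hypothetical sketch of the described per-clip pipeline is shown below, with the data-dependent orthogonal basis obtained here via an SVD of the clip matrix and the entropy-coding stage omitted; the quantization step size and the choice of SVD are illustrative assumptions.

```python
# Hypothetical per-clip transform coding: data-dependent orthogonal basis,
# decorrelating transform, and uniform quantization (entropy coding omitted).
import numpy as np

def encode_clip(clip, step=0.05):
    """clip: (d, n) matrix, d = skeletal degrees of freedom, n = frames in the clip."""
    U, _, _ = np.linalg.svd(clip, full_matrices=False)   # data-dependent orthogonal basis
    coeffs = U.T @ clip                                   # decorrelated transform coefficients
    q = np.round(coeffs / step).astype(np.int32)          # uniform quantization
    return U, q                                           # basis + symbols for entropy coding

def decode_clip(U, q, step=0.05):
    return U @ (q.astype(np.float64) * step)              # reconstruct the clip
```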